local search
Improving Decision Trees through the Lens of Parameterized Local Search
Algorithms for learning decision trees often include heuristic local-search operations such as (1) adjusting the threshold of a cut or (2) also exchanging the feature of that cut. We study minimizing the number of classification errors by performing a fixed number of a single type of these operations. Although we discover that the corresponding problems are NP-complete in general, we provide a comprehensive parameterized-complexity analysis with the aim of determining those properties of the problems that explain the hardness and those that make the problems tractable. For instance, we show that the problems remain hard for a small number d of features or small domain size D but the combination of both yields fixed-parameter tractability. That is, the problems are solvable in (D+1)2d |I|O(1) time, where |I|is the size of the input. We also provide a proof-of-concept implementation of this algorithm and report on empirical results.
The Complexity of Finding Local Optima in Contrastive Learning
The goal is to find representations (e.g., embeddings in Rd or a tree metric) where anchors are placed closer to positive than to negative examples. While finding global optima of contrastive objectives is NP-hard, the complexity of finding local optima--representations that do not improve by local search algorithms such as gradient-based methods--remains open. Our work settles the complexity of finding local optima in various contrastive learning problems by proving PLS-hardness in discrete settings (e.g., maximize satisfied triplets) and CLS-hardness in continuous settings (e.g., minimize Triplet Loss), where PLS(Polynomial Local Search) and CLS(Continuous Local Search) are well-studied complexity classes capturing local search dynamics in discrete and continuous optimization, respectively. Our results imply that no polynomial time algorithm (local search or otherwise) can find a local optimum for various contrastive learning problems, unless PLS P(or CLS P for continuous problems). Even in the unlikely scenario that PLS P(or CLS P), our reductions imply that there exist instances where local search algorithms need exponential time to reach a local optimum, even for d = 1(embeddings on a line).
k-Median Clustering via Metric Embedding: Towards Better Initialization with Differential Privacy
We propose a new initialization scheme for the k-median problem in the general metric space (e.g., discrete space induced by graphs), based on the construction of metric embedding tree structure of the data. We propose a novel and efficient search algorithm which finds initial centers that can be used subsequently for the local search algorithm. The so-called HST initialization method can produce initial centers achieving lower error than those from another popular method k-median++, also with higher efficiency when k is not too small. Our HST initialization are then extended to the setting of differential privacy (DP) to generate private initial centers. We show that the error of applying DP local search followed by our private HST initialization improves prior results on the approximation error, and approaches the lower bound within a small factor. Experiments demonstrate the effectiveness of our proposed methods.
Unsupervised Learning for Solving the Travelling Salesman Problem
We propose UTSP, an Unsupervised Learning (UL) framework for solving the Travelling Salesman Problem (TSP). We train a Graph Neural Network (GNN) using a surrogate loss. The GNN outputs a heat map representing the probability for each edge to be part of the optimal path. We then apply local search to generate our final prediction based on the heat map. Our loss function consists of two parts: one pushes the model to find the shortest path and the other serves as a surrogate for the constraint that the route should form a Hamiltonian Cycle. Experimental results show that UTSP outperforms the existing data-driven TSP heuristics. Our approach is parameter efficient as well as data efficient: the model takes 10% of the number of parameters and 0.2% of training samples compared with Reinforcement Learning or Supervised Learning methods.